1. Introduction



Moderate deviations date back to Cramér (1938), who obtained expansions for
tail probabilities for sums of independent random variables about the normal
distribution. For independent and identically distributed random variables
$X_1, \dots, X_n$ with $EX_i = 0$ and $\mathrm{Var}(X_i) = 1$ such that
$Ee^{t_0\sqrt{|X_1|}} \le c < \infty$ for some $t_0 > 0$, it follows from
Petrov (1975), Chapter 8, Theorem 1 that

$$\frac{P(W_n > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n} \tag{1.1}$$

for $0 \le x \le n^{1/6}$, where $W_n = (X_1 + \cdots + X_n)/\sqrt{n}$, $\Phi$ is the
standard normal distribution function, and $O(1)$ depends on $c$ and $t_0$. The
range $0 \le x \le n^{1/6}$ and the order of the error term
$O(1)(1 + x^3)/\sqrt{n}$ are optimal.
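As a numerical illustration of the ratio in the display above, the following sketch evaluates it exactly for sums of symmetric $\pm 1$ variables, which are bounded and therefore satisfy the moment condition. The choice of distribution and the function names are assumptions made for this illustration only.

```python
import math

def normal_tail(x):
    """1 - Phi(x) for the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def rademacher_tail(n, x):
    """Exact P(W_n > x) for W_n = (X_1 + ... + X_n)/sqrt(n), X_i = +-1 with probability 1/2."""
    # W_n = (2K - n)/sqrt(n) with K ~ Binomial(n, 1/2), so W_n > x iff K > (n + x sqrt(n))/2.
    threshold = (n + x * math.sqrt(n)) / 2.0
    k_min = math.floor(threshold) + 1  # smallest integer strictly above the threshold
    # Integer arithmetic keeps the sum exact; the final division returns a float.
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

def tail_ratio(n, x):
    """Ratio P(W_n > x) / (1 - Phi(x)), the quantity controlled by a moderate deviation bound."""
    return rademacher_tail(n, x) / normal_tail(x)
```

Because $W_n$ lives on a lattice here, the ratio does not converge smoothly, but its distance from 1 stays of the order $(1 + x^3)/\sqrt{n}$ in the stated range.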

The proof of (1.1) depends on the conjugate method and a Berry-Esseen
bound, while the classical proof of the Berry-Esseen bound for independent random
variables uses the Fourier transform. However, for dependent random variables,
Stein's method performs much better than the method of Fourier transform.
Stein's method was introduced by Charles Stein in 1972 and further developed
by him in 1986. Extensive applications of Stein's method to obtain
Berry-Esseen-type bounds for dependent random variables can be found in, for
example, Diaconis (1977), Baldi, Rinott and Stein (1989), Barbour (1990), Dembo
and Rinott (1996), Goldstein and Reinert (1997), Chen and Shao (2004),
Chatterjee (2008), and Nourdin and Peccati (2009). Recent applications to
concentration of measures and large deviations can be found in, for example,
Chatterjee (2007) and Chatterjee and Dey (2010). Expositions of Stein's method
and its applications in normal and other distributional approximations can be
found in Diaconis and Holmes (2004) and Barbour and Chen (2005).

In this paper we apply Stein's method to obtain a Cramér-type moderate
deviation result for dependent random variables whose dependence is defined in
terms of an identity, called Stein identity, which plays a central role in Stein's
method. A corollary for zero-bias coupling is deduced. The result is then applied
to a combinatorial central limit theorem, the anti-voter model, a general system
of binary codes, and the Curie-Weiss model. The bounds obtained in these
examples are as in (1.1) and therefore may be optimal (see Remark 4.1). It is noted
that Raič (2007) also used Stein's method to obtain moderate deviation results
for dependent random variables. However, the dependence structure considered
by him is related to local dependence and is of a different nature from what we
assume through the Stein identity.

This paper is organized as follows. Section 2 is devoted to a description of
Stein's method and to the construction of Stein identities using zero-bias
coupling and exchangeable pairs. Section 3 presents a general Cramér-type
moderate deviation result and a corollary for zero-bias coupling. The result is
applied to the four examples mentioned above in Section 4. Although the general
Cramér-type moderate deviation result cannot be applied to unbounded
independent random variables, the proof of the general result can be adapted to
prove (1.1) under less stringent conditions, thereby extending a result of Linnik
(1961). These are also presented in Section 4. The rest of the paper is devoted
to proofs.

2. Stein's method and Stein identity 

Let $W$ be the random variable of interest and $Z$ be another random variable.
In approximating $\mathcal{L}(W)$ by $\mathcal{L}(Z)$ using Stein's method, the
difference between $Eh(W)$ and $Eh(Z)$ for a class of functions $h$ is expressed as

$$Eh(W) - Eh(Z) = E\{Lf_h(W)\} \tag{2.2}$$

where $L$ is a linear operator and $f_h$ is a bounded solution of the equation
$Lf = h - Eh(Z)$. It is known that for $N(0, 1)$, $Lf(w) = f'(w) - wf(w)$ (see Stein
(1972)) and for Poisson($\lambda$), $Lf(w) = \lambda f(w + 1) - wf(w)$ (see Chen (1975)).
However, $L$ is not unique. For example, for normal approximation $L$ can also be
the generator of the Ornstein-Uhlenbeck process and for Poisson approximation
$L$ can be the generator of an immigration-death process. The solution $f_h$ will
then be expressed in terms of a Markov process. This generator approach to
Stein's method is due to Barbour (1988 and 1990).
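The two operators quoted above characterize their target distributions in the sense that $E\{Lf(Z)\} = 0$ when $Z$ has the target law. The sketch below checks this numerically for one test function each; the quadrature scheme, the truncation points and the function names are assumptions chosen for illustration.

```python
import math

def normal_stein_expectation(f, fprime, lo=-12.0, hi=12.0, steps=24000):
    """Trapezoid approximation of E[f'(Z) - Z f(Z)] for Z ~ N(0, 1)."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        w = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end-point weights
        density = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
        total += weight * (fprime(w) - w * f(w)) * density
    return total * h

def poisson_stein_expectation(f, lam, terms=200):
    """Truncated sum for E[lam * f(W + 1) - W * f(W)] with W ~ Poisson(lam)."""
    total = 0.0
    pmf = math.exp(-lam)  # P(W = 0)
    for k in range(terms):
        total += (lam * f(k + 1) - k * f(k)) * pmf
        pmf *= lam / (k + 1)  # update to P(W = k + 1)
    return total
```

Both expectations vanish up to numerical error, for example with `f = math.sin` in the normal case and `f(k) = 1/(1 + k)` in the Poisson case.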

By (2.2), bounding $Eh(W) - Eh(Z)$ is equivalent to bounding $E\{Lf_h(W)\}$.
To bound the latter one finds another operator $\tilde L$ such that
$E\{\tilde Lf(W)\} = 0$ for a class of functions $f$ including $f_h$ and writes
$\tilde L = L - R$ for a suitable operator $R$. The error term $E\{Lf_h(W)\}$ is then
expressed as $ERf_h(W)$. The equation

$$E\{\tilde Lf(W)\} = 0 \tag{2.3}$$

for a class of functions $f$ including $f_h$ is called a Stein identity for
$\mathcal{L}(W)$. For normal approximation there are four methods for constructing
a Stein identity: the direct method (Stein (1972)), zero-bias coupling (Goldstein
and Reinert (1997) and Goldstein (2005)), exchangeable pairs (Stein (1986)), and
Stein coupling (Chen and Röllin (2010)). We discuss below the construction of
Stein identities using zero-bias coupling and exchangeable pairs. As proved in
Goldstein and Reinert (1997), for $W$ with $EW = 0$ and $\mathrm{Var}(W) = 1$, there
always exists $W^*$ such that

EWf(W) = Ef'(W*) (2.4) 

for all bounded absolutely continuous $f$ with bounded derivative $f'$. The
distribution of $W^*$ is called $W$-zero-biased. If $W$ and $W^*$ are defined on the
same probability space (zero-bias coupling), we may write $\Delta = W^* - W$. Then
by (2.4), we obtain the Stein identity

$$EWf(W) = E\int_{-\infty}^{\infty} f'(W + t)\,d\mu(t \mid W), \tag{2.5}$$

where $\mu(\cdot \mid W)$ is the conditional distribution of $\Delta$ given $W$. Here the
operator appearing in (2.3) is
$\tilde Lf(w) = \int_{-\infty}^{\infty} f'(w + t)\,d\mu(t \mid W = w) - wf(w)$.

The method of exchangeable pairs (Stein (1986)) consists of constructing $W'$
such that $(W, W')$ is exchangeable. Then for any anti-symmetric function
$F(\cdot, \cdot)$, that is, $F(w, w') = -F(w', w)$,

$$EF(W, W') = 0$$

if the expectation exists. Suppose that there exist a constant $\lambda$
($0 < \lambda < 1$) and a random variable $R$ such that

$$E(W - W' \mid W) = \lambda(W - E(R \mid W)). \tag{2.6}$$

Then for all $f$

$$E\{(W - W')(f(W) + f(W'))\} = 0$$

provided the expectation exists. This gives the Stein identity

$$\begin{aligned}
EWf(W) &= -\frac{1}{2\lambda}E\{(W - W')(f(W') - f(W))\} + E(Rf(W)) \\
&= E\int_{-\infty}^{\infty} f'(W + t)\hat K(t)\,dt + E(Rf(W))
\end{aligned} \tag{2.7}$$

for all absolutely continuous functions $f$ for which expectations exist, where
$\hat K(t) = \frac{1}{2\lambda}\Delta\,(I(0 \le t \le \Delta) - I(\Delta \le t < 0))$ and
$\Delta = W' - W$. In this case, the operator appearing in (2.3) is
$\tilde Lf(w) = \int_{-\infty}^{\infty} f'(w + t)E(\hat K(t) \mid W = w)\,dt + E(R \mid W = w)f(w) - wf(w)$.
Both Stein identities (2.5) and (2.7) are special cases of

$$EWf(W) = E\int_{-\infty}^{\infty} f'(W + t)\,d\hat\mu(t) + E(Rf(W)) \tag{2.8}$$

where $\hat\mu$ is a random measure. We will prove a moderate deviation result by
assuming that $W$ satisfies the Stein identity (2.8).


3. A Cramér-type moderate deviation theorem

Let $W$ be a random variable of interest. Assume that there exist a deterministic
positive constant $\delta$, a random positive measure $\hat\mu$ with support
$[-\delta, \delta]$ and a random variable $R$ such that

$$EWf(W) = E\Big(\int_{|t| \le \delta} f'(W + t)\,d\hat\mu(t)\Big) + E(Rf(W)) \tag{3.1}$$

for all absolutely continuous functions $f$ for which the expectation of either side
exists. Let

$$D = \int_{|t| \le \delta} d\hat\mu(t). \tag{3.2}$$

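Theorem 3.1. Suppose that there exist constants $\delta_1$, $\delta_2$ and
$\theta \ge 1$ such that

$$|E(D \mid W) - 1| \le \delta_1(1 + |W|), \tag{3.3}$$

$$|E(R \mid W)| \le \delta_2(1 + |W|) \quad \text{or} \quad |E(R \mid W)| \le \delta_2(1 + W^2) \ \text{and}\ \delta_2|W| \le \alpha < 1 \tag{3.4}$$

and

$$E(D \mid W) \le \theta. \tag{3.5}$$

Then

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O_\alpha(1)\theta^3(1 + x^3)(\delta + \delta_1 + \delta_2) \tag{3.6}$$

for $0 \le x \le \theta^{-1}\min(\delta^{-1/3}, \delta_1^{-1/3}, \delta_2^{-1/3})$, where
$O_\alpha(1)$ denotes a quantity whose absolute value is bounded by a universal
constant which depends on $\alpha$ only under the second alternative of (3.4).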

Remark 3.1. Theorem 3.1 is intended for bounded random variables but with
very general dependence assumptions. For this reason, the support of the random
measure $\hat\mu$ is assumed to be within $[-\delta, \delta]$ where $\delta$ is typically
of the order of $1/\sqrt{n}$ due to standardization. In order for the normal
approximation to work, $E(D \mid W)$ should be close to 1 and $E(R \mid W)$ small. This
is reflected in $\delta_1$ and $\delta_2$, which are assumed to be small.



For zero-bias coupling, $D = 1$ and $R = 0$, so conditions (3.3), (3.4) and (3.5)
are satisfied with $\delta_1 = \delta_2 = 0$ and $\theta = 1$. Therefore, we have

Corollary 3.1. Let $W$ and $W^*$ be defined on the same probability space
satisfying (2.4). Assume that $EW = 0$, $EW^2 = 1$ and $|W - W^*| \le \delta$ for some
constant $\delta$. Then

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\delta$$

for $0 \le x \le \delta^{-1/3}$.

Remark 3.2. For an exchangeable pair $(W, W')$ satisfying (2.6) and
$|\Delta| \le \delta$, (3.1) is satisfied with $D = \Delta^2/(2\lambda)$.

Remark 3.3. Although one cannot apply Theorem 3.1 directly to unbounded
random variables, one can adapt the proof of Theorem 3.1 to give a proof of
(1.1) for independent random variables assuming the existence of the moment
generating functions of $|X_i|^{1/2}$, thereby extending a result of Linnik (1961).
This result is given in Proposition 4.6. The proof also suggests the possibility of
extending Theorem 3.1 to the case where the support of $\hat\mu$ may not be bounded.

4. Applications 

In this section we apply Theorem 3.1 to four cases of dependent random
variables, namely, a combinatorial central limit theorem, the anti-voter model on a
complete graph, a general system of binary codes, and the Curie-Weiss model.
The proofs of the results for the third and the fourth example will be given in
the last section. At the end of this section, we will present a moderate deviation
result for sums of independent random variables and the proof will also be given
in the last section.

4.1. Combinatorial central limit theorem

Let $\{a_{ij}\}_{i,j=1}^n$ be an array of real numbers satisfying
$\sum_{j=1}^n a_{ij} = 0$ for all $i$ and $\sum_{i=1}^n a_{ij} = 0$ for all $j$. Set
$c_0 = \max_{i,j}|a_{ij}|$ and $W = \sum_{i=1}^n a_{i\pi(i)}/\sigma$, where $\pi$ is a
uniform random permutation of $\{1, 2, \dots, n\}$ and
$\sigma^2 = E(\sum_{i=1}^n a_{i\pi(i)})^2$. In Goldstein (2005) $W$ is coupled with the
zero-biased $W^*$ in such a way that $|\Delta| = |W^* - W| \le 8c_0/\sigma$. Therefore,
by Corollary 3.1 with $\delta = 8c_0/\sigma$, we have

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)c_0/\sigma \tag{4.1}$$
4.2. Anti-voter model on a complete graph

Consider the anti-voter model on a complete graph with $n$ vertices, $1, \dots, n$,
and $(n - 1)n/2$ edges. Let $X_i$ be a random variable taking value 1 or $-1$ at the
vertex $i$, $i = 1, \dots, n$.

Let $X = (X_1, \dots, X_n)$, where $X_i$ takes values 1 or $-1$. The anti-voter
model in discrete time is described as the following Markov chain: in each step,
uniformly pick a vertex $I$ and an edge connecting it to $J$, and then change
$X_I$ to $-X_J$. Let $U = \sum_{i=1}^n X_i$ and $W = U/\sigma$, where
$\sigma^2 = \mathrm{Var}(U)$. Let $W' = (U - X_I - X_J)/\sigma$, where $I$ is uniformly
distributed on $\{1, 2, \dots, n\}$ independent of other random variables. Consider
the case where the distribution of $X$ is the stationary distribution. Then as
shown in Rinott and Rotar (1997),
$(W, W')$ is an exchangeable pair and

$$E(W - W' \mid W) = \frac{2}{n}W. \tag{4.2}$$

According to (2.7), (3.1) is satisfied with $\delta = 2/\sigma$ and $R = 0$. To check
conditions (3.3) and (3.5), let $T$ denote the number of 1's among
$X_1, \dots, X_n$, $a$ be the number of edges connecting two 1's, $b$ be the number of
edges connecting two $-1$'s, and $c$ be the number of edges connecting 1 and $-1$.
Since it is a complete graph, $a = T(T - 1)/2$ and $b = (n - T)(n - T - 1)/2$.
Therefore (see, for example, Rinott and Rotar (1997))

$$E[(W - W')^2 \mid X] = \frac{1}{\sigma^2}\,\frac{2U^2 + 2n^2 - 4n}{n(n - 1)} = \frac{2\sigma^2W^2 + 2n^2 - 4n}{\sigma^2 n(n - 1)} \tag{4.3}$$

and hence

$$E(D \mid W) - 1 = \frac{n}{4}E((W - W')^2 \mid W) - 1 = \frac{W^2}{2(n - 1)} - \frac{2\sigma^2(n - 1) - (n^2 - 2n)}{2\sigma^2(n - 1)}. \tag{4.4}$$

Noting that $E(E(D \mid W) - 1) = 0$ and $EW^2 = 1$, we have
$\sigma^2 = (n^2 - 2n)/(2n - 3)$. Hence,

$$E(D \mid W) - 1 = \frac{W^2}{2(n - 1)} - \frac{1}{2(n - 1)}, \tag{4.5}$$

which means that (3.3) is satisfied with $\delta_1 = O(n^{-1/2})$. Thus, we have the
following moderate deviation result.

Proposition 4.1. We have

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n}$$

for $0 \le x \le n^{1/6}$.
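The conditional moments behind (4.2) and (4.3) only involve averaging over the uniformly chosen pair $(I, J)$, so they can be checked exactly for any fixed configuration before dividing by $\sigma$. The sketch below does this with rational arithmetic; the function names are hypothetical and the check is written for the unnormalized difference $U - U' = X_I + X_J$.

```python
from fractions import Fraction
from itertools import permutations, product

def pair_averages(x):
    """Averages of (x_I + x_J) and (x_I + x_J)^2 over uniformly chosen ordered pairs I != J."""
    n = len(x)
    pairs = list(permutations(range(n), 2))
    first = Fraction(sum(x[i] + x[j] for i, j in pairs), len(pairs))
    second = Fraction(sum((x[i] + x[j]) ** 2 for i, j in pairs), len(pairs))
    return first, second

def check_configuration(x):
    """True when E[U - U' | X = x] = 2U/n and E[(U - U')^2 | X = x] = (2U^2 + 2n^2 - 4n)/(n(n-1))."""
    n, u = len(x), sum(x)
    first, second = pair_averages(x)
    return (first == Fraction(2 * u, n)
            and second == Fraction(2 * u * u + 2 * n * n - 4 * n, n * (n - 1)))
```

Dividing the first identity by $\sigma$ and the second by $\sigma^2$ recovers the forms used in (4.2) and (4.3).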



4.3. A general system of binary codes

In Chen, Hwang and Zacharovas (2011), a general system of binary codes is
defined as follows. Suppose each nonnegative integer $x$ is coded by a binary
string consisting of 0's and 1's. Let $S(x)$ denote the number of 1's in the resulting
coding string of $x$ and let

$$S = (S(0), S(1), \dots). \tag{4.6}$$

For each nonnegative integer $n$, define $S_n = S(X)$, where $X$ is a random integer
uniformly distributed over the set $\{0, 1, \dots, n\}$. The general system of binary
codes introduced by Chen, Hwang and Zacharovas (2011) is one in which

$$S_{2m-1} = S_{m-1} + I \quad \text{in distribution for all } m \ge 1, \tag{4.7}$$

where $I$ is an independent Bernoulli(1/2) random variable. Chen, Hwang and
Zacharovas (2011) proved the asymptotic normality of $S_n$. Here, we apply
Theorem 3.1 to obtain the following Cramér moderate deviation result. For
$n \ge 1$, let integer $k$ be such that $2^{k-1} - 1 < n \le 2^k - 1$, and let
$W_n = (S_n - k/2)/\sqrt{k/4}$.

Proposition 4.2. Under the assumption (4.7), we have

$$\frac{P(W_n > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\frac{1}{\sqrt{k}}$$

for $0 \le x \le k^{1/6}$.

As an example of this system of binary codes, we consider the binary
expansion of a random integer $X$ uniformly distributed over $\{0, 1, \dots, n\}$. For
$2^{k-1} - 1 < n \le 2^k - 1$, write $X$ as

$$X = \sum_{i=1}^{k} X_i 2^{k-i}$$

and let $S_n = X_1 + \cdots + X_k$. Set $W_n = (S_n - k/2)/\sqrt{k/4}$. It is easy to verify
that $S_n$ satisfies (4.7). A Berry-Esseen bound for $W_n$ was first obtained by
Diaconis (1977). Proposition 4.2 provides a Cramér moderate deviation result for
$W_n$. Other examples of this system of binary codes include the binary reflected
Gray code and a coding system using translation and complementation. Detailed
descriptions of these codes are given in Chen, Hwang and Zacharovas (2011).
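Condition (4.7) can be checked exactly for small $m$ by comparing two finite distributions. The sketch below does so for the binary expansion and for the binary reflected Gray code, using the standard encoding `x ^ (x >> 1)` for the latter; the helper names and the range of $m$ are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction

def ones_binary(x):
    """Number of 1's in the binary expansion of x."""
    return bin(x).count("1")

def ones_gray(x):
    """Number of 1's in the binary reflected Gray code of x."""
    return bin(x ^ (x >> 1)).count("1")

def law(count_ones, n):
    """Exact distribution of S(X) for X uniform on {0, ..., n}."""
    counts = Counter(count_ones(x) for x in range(n + 1))
    return {s: Fraction(c, n + 1) for s, c in counts.items()}

def add_fair_bit(dist):
    """Distribution of S + I, where I is an independent Bernoulli(1/2) variable."""
    out = {}
    for s, p in dist.items():
        for bit in (0, 1):
            out[s + bit] = out.get(s + bit, Fraction(0)) + p / 2
    return out

def satisfies_4_7(count_ones, m_max):
    """Check S_{2m-1} = S_{m-1} + I in distribution for m = 1, ..., m_max."""
    return all(law(count_ones, 2 * m - 1) == add_fair_bit(law(count_ones, m - 1))
               for m in range(1, m_max + 1))
```

For the binary expansion the identity reflects the decomposition $x = 2y + b$ with a uniform last bit $b$; for the Gray code the last code bit is $b$ XOR the lowest bit of $y$, which is again a fair bit independent of $y$.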

4.4. Curie-Weiss model

Consider the Curie-Weiss model for $n$ spins
$\Sigma = (\sigma_1, \sigma_2, \dots, \sigma_n) \in \{-1, 1\}^n$.
The joint distribution of $\Sigma$ is given by

$$Z_{\beta,h}^{-1}\exp\Big(\frac{\beta}{n}\sum_{1 \le i < j \le n}\sigma_i\sigma_j + \beta h\sum_{i=1}^{n}\sigma_i\Big)$$

where $Z_{\beta,h}$ is the normalizing constant, and $\beta > 0$, $h \in \mathbb{R}$ are
called the inverse of temperature and the external field respectively. We are
interested in the total magnetization $S = \sum_{i=1}^n \sigma_i$. We divide the region
$\beta > 0$, $h \in \mathbb{R}$ into three parts, and for each part, we list the
concentration property and the limiting distribution of $S$ under proper
standardization. Consider the solution(s) to the equation

$$m = \tanh(\beta(m + h)). \tag{4.9}$$

Case 1. $0 < \beta < 1$, $h \in \mathbb{R}$ or $\beta \ge 1$, $h \ne 0$. There is a unique
solution $m_0$ to (4.9) such that $m_0h \ge 0$. In this case, $S/n$ is concentrated
around $m_0$ and has a Gaussian limit under proper standardization.

Case 2. $\beta > 1$, $h = 0$. There are two non-zero solutions to (4.9),
$m_1 < 0 < m_2$. Conditioned on $S < 0$ ($S > 0$ respectively), $S/n$ is concentrated
around $m_1$ ($m_2$ respectively) and has a Gaussian limit under proper
standardization.

Case 3. $\beta = 1$, $h = 0$. $S/n$ is concentrated around 0 but the limit distribution
is not Gaussian.

We refer to Ellis (1985) for the concentration of measure results, and to Ellis and
Newman (1978a, 1978b) for the results on limiting distributions. See also
Chatterjee and Shao (2011) for a Berry-Esseen type bound when the limiting
distribution is not Gaussian. Here we focus on the Gaussian case and prove the
following two Cramér moderate deviation results for Case 1 and Case 2.

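Proposition 4.3. In Case 1, define

$$W = \frac{S - nm_0}{\sigma} \tag{4.10}$$

where

$$\sigma^2 = \frac{n(1 - m_0^2)}{1 - (1 - m_0^2)\beta}. \tag{4.11}$$

Then we have

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n}, \tag{4.12}$$

for $0 \le x \le n^{1/6}$.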

Proposition 4.4. In Case 2, define

$$W_1 = \frac{S - nm_1}{\sigma_1}, \qquad W_2 = \frac{S - nm_2}{\sigma_2} \tag{4.13}$$

where

$$\sigma_1^2 = \frac{n(1 - m_1^2)}{1 - (1 - m_1^2)\beta}, \qquad \sigma_2^2 = \frac{n(1 - m_2^2)}{1 - (1 - m_2^2)\beta}. \tag{4.14}$$

Then we have

$$\frac{P(W_1 > x \mid S < 0)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n} \tag{4.15}$$

and

$$\frac{P(W_2 > x \mid S > 0)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n} \tag{4.16}$$

for $0 \le x \le n^{1/6}$.
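The centering and scaling constants in Propositions 4.3 and 4.4 come from the fixed-point equation (4.9). A minimal sketch, assuming either $h > 0$ or ($\beta > 1$, $h = 0$) so that a positive root exists, solves (4.9) by bisection and evaluates the variance formula; the function names and tolerances are hypothetical.

```python
import math

def positive_root(beta, h, tol=1e-12):
    """Bisection for the positive solution of m = tanh(beta * (m + h)).

    Assumes h > 0, or beta > 1 with h = 0, so that tanh(beta*(m + h)) - m is
    positive just above 0 and negative at m = 1.
    """
    def g(m):
        return math.tanh(beta * (m + h)) - m

    lo, hi = 1e-9, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def sigma_squared(n, m, beta):
    """Variance constant n(1 - m^2) / (1 - (1 - m^2) beta) from (4.11) and (4.14)."""
    return n * (1.0 - m * m) / (1.0 - (1.0 - m * m) * beta)
```

In Case 2 the root returned is $m_2$ and, by symmetry of (4.9) when $h = 0$, $m_1 = -m_2$; for $\beta = 1.5$ the root is close to 0.8586.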



4.5. Independent random variables

Moderate deviation for independent random variables has been extensively
studied in the literature (see, for example, Petrov (1975), Chapter 8) based on the
conjugate method. Here, we will adapt the proof of Theorem 3.1 to prove the
following moderate deviation result, which is a variant of those in the literature
(see again Petrov (1975), Chapter 8).

Proposition 4.5. Let $\xi_i$, $1 \le i \le n$ be independent random variables with
$E\xi_i = 0$ and $Ee^{t_n|\xi_i|} < \infty$ for some $t_n$ and for each $1 \le i \le n$.
Assume that

$$\sum_{i=1}^{n} E\xi_i^2 = 1. \tag{4.17}$$

Then

$$\frac{P(W > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\gamma e^{4x^3\gamma} \tag{4.18}$$

for $0 \le x \le t_n$, where $\gamma = \sum_{i=1}^n E|\xi_i|^3e^{x|\xi_i|}$.

We deduce (1.1) under less stringent conditions from Proposition 4.5 and
extend a result of Linnik (1961) to independent but not necessarily identically
distributed random variables.
Proposition 4.6. Let $X_i$, $1 \le i \le n$ be a sequence of independent random
variables with $EX_i = 0$. Put $S_n = \sum_{i=1}^n X_i$ and $B_n^2 = \sum_{i=1}^n EX_i^2$.
Assume that there exist positive constants $c_1$, $c_2$ and $t_0$ such that



B 2 n >c\n, Ee to v\ Xil <c 2 for 1 < i < n. (4.19) 

Then

$$\frac{P(S_n/B_n > x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n} \tag{4.20}$$

for < x < toc^ 2 ?i 1 / 6 /4 ; where 0(1) is an absolute constant depending on c 2 
and to . In particular, we have 

$$\frac{P(S_n/B_n > x)}{1 - \Phi(x)} \to 1 \tag{4.21}$$

uniformly in $0 \le x \le o(n^{1/6})$.



Proof of Proposition 4.6. The main idea is first truncating and then
applying Proposition 4.5 to the truncated sequence. W.l.o.g., assume $c_1 = 1$.
Let

n 

Xi = Xil{\Xi\ < n 2 / 3 ), S n =J2**- 

i=i 

Observe that 

\P(S n /B n >x)-P(S n /B n >x)\ 

n 

i=l 
n 

< ^ e - t0 " 1/3 £;e t0 ^l < c 2 ne' t0 " 1/3 = 0(1)(1 - $(s))(l + .t 3 )/V^ 
i=i 

for < a: < cit ™ 1/6 /4. Now let & = {X—EX^/B^ where B 2 = £ ? n =1 Var(X;). 
It is easy to see that 



Y^\EXi\ < 5>LY 4 |l(|X t | > n 2 / 3 ) 

i=l i=l 
n 

< ^ne-* " 1/3 ^e*«v^ = o ( n -2) ( 4 . 22 ) 



i=l 
and similarly, B n = B n (l + o(r7T 2 )). Thus, for < x < £ n 1/6 / 4

x\Zi\ < —\Xt\lQXi\ < n 2 / 3 ) + o(l) < -^^\X~\+ (1) < t ^/\X~\/2 + o(l) 

and hence 7 = 0(n~ 1 ^ 2 ). Applying Proposition 14.51 to {£,,1 < i < n} gives 

Remark 4.1. As has been remarked for (1.1) in the Introduction, the range of $x$
(of order $n^{1/6}$) and the order of the error term $O(1)(1 + x^3)/\sqrt{n}$ in
Proposition 4.6 are optimal. By comparing with (1.1), the results in the four
examples discussed above may be optimal.

5. Preliminary Lemmas

To prove Theorem 3.1, we first need to develop two preliminary lemmas. Our
first lemma gives a bound for the moment generating function of $W$.

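Lemma 5.1. Let $W$ be a random variable with $E|W| \le C$. Assume that there
exist $\delta > 0$, $\delta_1 \ge 0$, $0 \le \delta_2 \le 1/4$ and $\theta \ge 1$ such that
(3.1) and (3.3)-(3.5) are satisfied. Then for all $0 < t \le 1/(2\delta)$ satisfying

$$t\delta_1 + C_\alpha t\theta\delta_2 \le 1/2 \tag{5.2}$$

where

$$C_\alpha = \begin{cases} 12 & \text{under the first alternative of (3.4)} \\ \dfrac{2(3 + \alpha)}{1 - \alpha} & \text{under the second alternative of (3.4)} \end{cases} \tag{5.3}$$

we have

$$Ee^{tW} \le \exp(t^2/2 + c_0(t)) \tag{5.4}$$

where

$$c_0(t) = c_1(C, C_\alpha)\theta\{\delta_2t + \delta_1t^2 + (\delta + \delta_1 + \delta_2)t^3\} \tag{5.5}$$

and $c_1(C, C_\alpha)$ is a constant depending only on $C$ and $C_\alpha$.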
Proof. Fix $a > 0$, $t \in (0, 1/(2\delta)]$ and $s \in (0, t]$ and let
$f(w) = e^{s(w \wedge a)}$. Letting $h(s) = Ee^{s(W \wedge a)}$, firstly we prove that
$h'(s)$ can be bounded by $sh(s)$ and $EW^2f(W)$. By (3.1),

$$\begin{aligned}
h'(s) &= E(W \wedge a)e^{s(W \wedge a)} \le E(Wf(W)) \\
&= E\int f'(W + t)\,d\hat\mu(t) + E(Rf(W)) \\
&= sE\int e^{s(W + t)}I(W + t \le a)\,d\hat\mu(t) + E(e^{s(W \wedge a)}E(R \mid W)) \\
&\le sE\int e^{s[(W + t) \wedge a]}\,d\hat\mu(t) + E(e^{s(W \wedge a)}E(R \mid W)) \\
&\le sE\int e^{s(W \wedge a + \delta)}\,d\hat\mu(t) + E(e^{s(W \wedge a)}E(R \mid W)) \\
&= sE\int e^{s(W \wedge a)}\,d\hat\mu(t) + sE\int e^{s(W \wedge a)}(e^{s\delta} - 1)\,d\hat\mu(t) + E(e^{s(W \wedge a)}E(R \mid W)) \\
&\le sEe^{s(W \wedge a)}D + sEe^{s(W \wedge a)}|e^{s\delta} - 1|D + 2\delta_2E((1 + W^2)e^{s(W \wedge a)}),
\end{aligned}$$

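where we have applied (3.2) and (3.4) to obtain the last inequality. Now, applying
the simple inequality

$$|e^x - 1| \le 2|x| \quad \text{for } |x| \le 1,$$

and then (3.3), we find that

$$\begin{aligned}
E(Wf(W)) &\le sEe^{s(W \wedge a)}D + sEe^{s(W \wedge a)}2s\delta D + 2\delta_2E((1 + W^2)e^{s(W \wedge a)}) \\
&\le sEe^{s(W \wedge a)}E(D \mid W) + 2s^2\theta\delta Ee^{s(W \wedge a)} + 2\delta_2E((1 + W^2)e^{s(W \wedge a)}) \\
&= sEe^{s(W \wedge a)} + sEe^{s(W \wedge a)}[E(D \mid W) - 1] \\
&\quad + 2s^2\theta\delta Ee^{s(W \wedge a)} + 2\delta_2E((1 + W^2)e^{s(W \wedge a)}) \\
&\le sEe^{s(W \wedge a)} + s\delta_1Ee^{s(W \wedge a)}(1 + |W|) + 2s^2\theta\delta Ee^{s(W \wedge a)} \\
&\quad + 2\delta_2E((1 + W^2)e^{s(W \wedge a)}).
\end{aligned}$$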



Note that

$$\begin{aligned}
E|W|e^{s(W \wedge a)} &= EWe^{s(W \wedge a)} + 2EW^-e^{s(W \wedge a)} \\
&\le E(Wf(W)) + 2E|W| \le 2C + E(Wf(W)).
\end{aligned} \tag{5.6}$$

Collecting terms we obtain

$$\begin{aligned}
h'(s) &\le E(Wf(W)) \\
&\le \{(s(1 + \delta_1 + 2t\theta\delta) + 2\delta_2)h(s) + 2\delta_2EW^2f(W) + 2Cs\delta_1\}/(1 - s\delta_1).
\end{aligned} \tag{5.7}$$

Secondly, we show that $EW^2f(W)$ can be bounded by a function of $h(s)$ and
$h'(s)$. Letting $g(w) = we^{s(w \wedge a)}$, and then arguing as for (5.7),

$$\begin{aligned}
EW^2f(W) &= EWg(W) \\
&= E\int \big(e^{s[(W + t) \wedge a]} + s(W + t)e^{s[(W + t) \wedge a]}I(W + t \le a)\big)\,d\hat\mu(t) + E(RWf(W)) \\
&\le E\int \big(e^{s(W \wedge a)}e^{s\delta} + s[(W + t) \wedge a]e^{s(W \wedge a)}e^{s\delta}\big)\,d\hat\mu(t) + E(RWf(W)) \\
&\le e^{s\delta}E(f(W) + sf(W)((W \wedge a) + \delta))D + E(RWf(W)) \\
&\le \theta e^{0.5}(1 + 0.5)Ef(W) + s\theta e^{s\delta}E(W \wedge a)f(W) + E(RWf(W)) \\
&\le 1.5e^{0.5}\theta h(s) + 2s\theta h'(s) + E(RWf(W)).
\end{aligned} \tag{5.8}$$

Note that under the first alternative of (3.4),

$$|E(RWf(W))| \le \delta_2Ef(W) + 2\delta_2EW^2f(W), \tag{5.9}$$

and under the second alternative of (3.4),

$$|E(RWf(W))| \le \alpha Ef(W) + \alpha EW^2f(W). \tag{5.10}$$

Thus, recalling $\delta_2 \le 1/4$ and $\alpha < 1$, we have

$$EW^2f(W) \le \frac{C_\alpha}{2}(\theta h(s) + s\theta h'(s)) \tag{5.11}$$

where $C_\alpha$ is defined in (5.3).
We are now ready to prove (5.4). Substituting (5.11) into (5.7) yields

$$\begin{aligned}
(1 - s\delta_1)h'(s) &\le (s(1 + \delta_1 + 2t\theta\delta) + 2\delta_2)h(s) + \delta_2C_\alpha(\theta h(s) + s\theta h'(s)) + 2Cs\delta_1 \\
&\le (s(1 + \delta_1 + 2t\theta\delta) + 2\delta_2(1 + C_\alpha\theta))h(s) + C_\alpha s\theta\delta_2h'(s) + 2Cs\delta_1 \\
&\le (s(1 + \delta_1 + 2t\theta\delta) + 2\delta_2(1 + C_\alpha\theta))h(s) + C_\alpha t\theta\delta_2h'(s) + 2Cs\delta_1.
\end{aligned} \tag{5.12}$$

Solving for $h'(s)$, we obtain

$$h'(s) \le (sc_1(t) + c_2(t))h(s) + \frac{2Cs\delta_1}{1 - c_3(t)}, \tag{5.13}$$

where

$$c_1(t) = \frac{1 + \delta_1 + 2t\theta\delta}{1 - c_3(t)}, \qquad c_2(t) = \frac{2\delta_2(1 + C_\alpha\theta)}{1 - c_3(t)}, \qquad c_3(t) = t\delta_1 + C_\alpha t\theta\delta_2.$$

Now taking $t$ to satisfy (5.2) yields $c_3(t) \le 1/2$, so in particular $c_i(t)$ is
nonnegative for $i = 1, 2$, and $1/(1 - c_3(t)) \le 1 + 2c_3(t)$.
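Solving (5.13), we have

$$h(s) \le \exp\Big(\frac{t^2}{2}c_1(t) + tc_2(t) + 2C\delta_1t^2\Big). \tag{5.14}$$

Note that $c_3(t) \le 1/2$, $\delta_2 \le 1/4$ and $\theta \ge 1$. Elementary calculations
now give

$$\begin{aligned}
&\frac{t^2}{2}(c_1(t) - 1) + tc_2(t) + 2C\delta_1t^2 \\
&\quad = \frac{t^2}{2}\,\frac{\delta_1 + 2t\theta\delta + c_3(t)}{1 - c_3(t)} + \frac{2t\delta_2(1 + C_\alpha\theta)}{1 - c_3(t)} + 2C\delta_1t^2 \\
&\quad \le t^2(\delta_1 + 2t\theta\delta + t\delta_1 + C_\alpha t\theta\delta_2) + 4t\delta_2(1 + C_\alpha\theta) + 2C\delta_1t^2 \\
&\quad \le c_0(t)
\end{aligned}$$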

imsart-generic ver. 2006/10/13 file: ll-5-16.tex date: May 30, 2011 



L.H.Y. Chen, X. Fang and Q.M. Shao/ 'Moderate Deviations 18 

and hence

$$t^2c_1(t)/2 + tc_2(t) + 2C\delta_1t^2 \le t^2/2 + c_0(t),$$

thus proving (5.4) by letting $a \to \infty$. □

Lemma 5.2. Suppose that for some nonnegative $\delta$, $\delta_1$ and $\delta_2$
satisfying $\max(\delta, \delta_1, \delta_2) \le 1$ and $\theta \ge 1$, (5.4) is satisfied,
with $c_0(t)$ as in (5.5), for all

$$t \in [0, \theta^{-1}\min(\delta^{-1/3}, \delta_1^{-1/3}, \delta_2^{-1/3})]. \tag{5.15}$$

Then for integers $k \ge 1$,

$$\int_0^t u^ke^{u^2/2}P(W > u)\,du \le c_2(C, C_\alpha)t^k \tag{5.16}$$

where $c_2(C, C_\alpha)$ is a constant depending only on $C$ and $C_\alpha$ defined in
Lemma 5.1.

Proof. For $t$ satisfying (5.15) it is easy to see that $c_0(t) \le 5c_1(C, C_\alpha)$,
where $c_1(C, C_\alpha)$ is as in Lemma 5.1, and (5.2) is satisfied. Write

$$\int_0^t u^ke^{u^2/2}P(W > u)\,du = \int_0^{[t]} u^ke^{u^2/2}P(W > u)\,du + \int_{[t]}^t u^ke^{u^2/2}P(W > u)\,du,$$

where $[t]$ denotes the integer part of $t$. For the first integral, noting that
$\sup_{j-1 \le u \le j} e^{u^2/2 - ju} = e^{(j-1)^2/2 - j(j-1)}$, we have

$$\begin{aligned}
\int_0^{[t]} u^ke^{u^2/2}P(W > u)\,du
&\le \sum_{j=1}^{[t]} j^k\int_{j-1}^{j} e^{u^2/2 - ju}e^{ju}P(W > u)\,du \\
&\le \sum_{j=1}^{[t]} j^ke^{(j-1)^2/2 - j(j-1)}\int_{j-1}^{j} e^{ju}P(W > u)\,du \\
&\le 2\sum_{j=1}^{[t]} j^ke^{-j^2/2}\int_{-\infty}^{\infty} e^{ju}P(W > u)\,du \\
&= 2\sum_{j=1}^{[t]} j^ke^{-j^2/2}(1/j)Ee^{jW} \\
&\le 2\sum_{j=1}^{[t]} j^{k-1}\exp(-j^2/2 + j^2/2 + c_0(j)) \\
&\le 2e^{c_0(t)}\sum_{j=1}^{[t]} j^{k-1} \\
&\le c_2(C, C_\alpha)t^k.
\end{aligned} \tag{5.17}$$



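Similarly, we have

$$\begin{aligned}
\int_{[t]}^t u^ke^{u^2/2}P(W > u)\,du
&\le t^k\int_{[t]}^t e^{u^2/2 - tu}e^{tu}P(W > u)\,du \\
&\le t^ke^{[t]^2/2 - t[t]}\int_{[t]}^t e^{tu}P(W > u)\,du \\
&\le 2t^ke^{-t^2/2}\int_{-\infty}^{\infty} e^{tu}P(W > u)\,du \\
&\le c_2(C, C_\alpha)t^k.
\end{aligned}$$

This completes the proof.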
6. Proofs of results

In this section, let $O_\alpha(1)$ denote universal constants which depend on $\alpha$
only under the second alternative of (3.4).

6.1. Proof of Theorem 3.1

If $\theta^{-1}\min(\delta^{-1/3}, \delta_1^{-1/3}, \delta_2^{-1/3}) \le O_\alpha(1)$, then
$1/(1 - \Phi(x)) \le 1/(1 - \Phi(O_\alpha(1)))$ for $0 \le x \le O_\alpha(1)$. Moreover,
$\theta^3(\delta + \delta_1 + \delta_2) \ge O_\alpha(1)$. Therefore, (3.6) is trivial.
Hence, we can assume

$$\theta^{-1}\min(\delta^{-1/3}, \delta_1^{-1/3}, \delta_2^{-1/3}) \ge O_\alpha(1) \tag{6.1}$$

so that $\delta \le 1$, $\delta_2 \le 1/4$, $\delta_1 + 2\delta_2 < 1$, and moreover,
$\delta_1 + \delta_2 + \alpha < 1$ under the second alternative of (3.4). Our proof is
based on Stein's method. Let $f = f_x$ be the solution to the Stein equation

$$wf(w) - f'(w) = I(w > x) - (1 - \Phi(x)). \tag{6.2}$$

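It is known that

$$\begin{aligned}
f(w) &= \begin{cases} \sqrt{2\pi}\,e^{w^2/2}(1 - \Phi(w))\Phi(x), & w > x \\ \sqrt{2\pi}\,e^{w^2/2}(1 - \Phi(x))\Phi(w), & w \le x \end{cases} \\
&\le \frac{4}{1 + w}1(w > x) + 3(1 - \Phi(x))e^{w^2/2}1(0 < w \le x) + 4(1 - \Phi(x))\frac{1}{1 + |w|}1(w \le 0)
\end{aligned} \tag{6.3}$$

by using the following well-known inequality

$$(1 - \Phi(w))e^{w^2/2} \le \min\Big(\frac{1}{2}, \frac{1}{w\sqrt{2\pi}}\Big), \quad w > 0.$$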
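Both the closed form of $f_x$ and the tail inequality above are easy to test numerically. The sketch below, with hypothetical function names and a finite-difference step chosen for illustration, evaluates the residual of the Stein equation (6.2) away from the kink at $w = x$ and checks the inequality on a grid.

```python
import math

def Phi(w):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-w / math.sqrt(2.0))

def mills(w):
    """(1 - Phi(w)) * exp(w^2 / 2), computed with erfc to avoid cancellation."""
    return 0.5 * math.erfc(w / math.sqrt(2.0)) * math.exp(0.5 * w * w)

def stein_solution(w, x):
    """Closed form of f_x(w) from (6.3)."""
    c = math.sqrt(2.0 * math.pi)
    if w > x:
        return c * mills(w) * Phi(x)
    return c * math.exp(0.5 * w * w) * Phi(w) * (1.0 - Phi(x))

def stein_residual(w, x, h=1e-5):
    """w f(w) - f'(w) - [1(w > x) - (1 - Phi(x))], with a central difference for f'."""
    fprime = (stein_solution(w + h, x) - stein_solution(w - h, x)) / (2.0 * h)
    indicator = 1.0 if w > x else 0.0
    return w * stein_solution(w, x) - fprime - (indicator - (1.0 - Phi(x)))
```

The residual is zero up to discretization error at points with $|w - x| > h$, and `mills(w)` stays below $\min(1/2, 1/(w\sqrt{2\pi}))$ for the positive $w$ tested.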

It is also known that $wf(w)$ is an increasing function (see Lemma 2.2, Chen and
Shao (2005)). By (3.1) we have

$$EWf(W) - ERf(W) = E\int f'(W + t)\,d\hat\mu(t) \tag{6.4}$$

and monotonicity of $wf(w)$ and equation (6.2) imply that

$$f'(W + t) \le (W + \delta)f(W + \delta) + 1 - \Phi(x) - 1(W > x + \delta). \tag{6.5}$$

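Recall that $\int d\hat\mu(t) = D$. Thus using non-negativity of $\hat\mu$ and combining
(6.4), (6.5) we have

$$\begin{aligned}
EWf(W) - ERf(W) &\le E\int ((W + \delta)f(W + \delta) - Wf(W))\,d\hat\mu(t) + EWf(W)D \\
&\quad + E\int \{1 - \Phi(x) - 1(W > x + \delta)\}\,d\hat\mu(t).
\end{aligned} \tag{6.6}$$

Now, by (3.2), the expression above can be written

$$\begin{aligned}
&E((W + \delta)f(W + \delta) - Wf(W))D + EWf(W)D + E\{1 - \Phi(x) - 1(W > x + \delta)\}D \\
&\quad = 1 - \Phi(x) - P(W > x + \delta) \\
&\qquad + E((W + \delta)f(W + \delta) - Wf(W))D + EWf(W)D \\
&\qquad + E\{1 - \Phi(x) - 1(W > x + \delta)\}(D - 1).
\end{aligned} \tag{6.7}$$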

Therefore, we have

$$\begin{aligned}
P(W > x + \delta) - (1 - \Phi(x))
&\le E((W + \delta)f(W + \delta) - Wf(W))D + EWf(W)(D - 1) \\
&\quad + E\{1 - \Phi(x) - 1(W > x + \delta)\}(D - 1) + ERf(W) \\
&\le \theta E((W + \delta)f(W + \delta) - Wf(W)) + \delta_1E(|W|(1 + |W|)f(W)) \\
&\quad + \delta_1E|1 - \Phi(x) - 1(W > x + \delta)|(1 + |W|) + \delta_2E(2 + W^2)f(W)
\end{aligned} \tag{6.8}$$

where we have again applied the monotonicity of $wf(w)$ as well as (3.5), (3.3)
and (3.4). Hence we have that

$$P(W > x + \delta) - (1 - \Phi(x)) \le \theta I_1 + \delta_1I_2 + \delta_1I_3 + \delta_2I_4, \tag{6.9}$$

where

$$\begin{aligned}
I_1 &= E((W + \delta)f(W + \delta) - Wf(W)), \\
I_2 &= E(|W|(1 + |W|)f(W)), \\
I_3 &= E|1 - \Phi(x) - 1(W > x + \delta)|(1 + |W|) \quad \text{and} \\
I_4 &= E(2 + W^2)f(W).
\end{aligned}$$

By (6.3) we have

$$Ef(W) \le 4P(W > x) + 4(1 - \Phi(x)) + 3(1 - \Phi(x))Ee^{W^2/2}1(0 < W \le x). \tag{6.10}$$

Note that by (3.1) with $f(w) = w$,

$$EW^2 = E\int d\hat\mu(t) + E(RW) = ED + E(RW).$$

Therefore, under the first alternative of (3.4),
$EW^2 \le (1 + 2\delta_1 + \delta_2) + (\delta_1 + 2\delta_2)EW^2$, and under the second
alternative of (3.4),
$EW^2 \le (1 + 2\delta_1 + \delta_2) + (\delta_1 + \delta_2 + \alpha)EW^2$. This shows
$EW^2 \le O_\alpha(1)$. Hence the hypotheses of Lemma 5.1 are satisfied with
$C = O_\alpha(1)$, and therefore also the conclusion of Lemma 5.2. In particular,

$$\begin{aligned}
Ee^{W^2/2}1(0 < W \le x) &\le P(0 < W \le x) + \int_0^x ye^{y^2/2}P(W > y)\,dy \\
&\le O_\alpha(1)(1 + x).
\end{aligned} \tag{6.11}$$

Similarly, by (6.3) again

$$EW^2f(W) \le 4E|W|1(W > x) + 4(1 - \Phi(x))E|W| + 3(1 - \Phi(x))EW^2e^{W^2/2}1(0 < W \le x)$$

and by Lemma 5.2

$$\begin{aligned}
EW^2e^{W^2/2}1(0 < W \le x) &\le \int_0^x (y^3 + 2y)e^{y^2/2}P(W > y)\,dy \\
&\le O_\alpha(1)(1 + x^3).
\end{aligned} \tag{6.12}$$
As to

$$E|W|1(W > x) \le P(W > x) + EW^21(W > x),$$

it follows from Lemma 5.1 that

$$P(W > x) \le e^{-x^2}Ee^{xW} = O_\alpha(1)e^{-x^2/2} \tag{6.13}$$

and

$$\begin{aligned}
\int_x^\infty tP(W > t)\,dt &\le Ee^{xW}\int_x^\infty te^{-xt}\,dt \\
&= Ee^{xW}x^{-2}(1 + x^2)e^{-x^2} \le O_\alpha(1)e^{-x^2/2}x^{-2}(1 + x^2) \\
&\le O_\alpha(1)e^{-x^2/2}
\end{aligned} \tag{6.14}$$

for $x \ge 1$. Thus, we have for $x > 1$

$$\begin{aligned}
EW^21(W > x) &= x^2P(W > x) + \int_x^\infty 2yP(W > y)\,dy \\
&\le O_\alpha(1)(1 + x^2)e^{-x^2/2} \le O_\alpha(1)(1 + x^3)(1 - \Phi(x)).
\end{aligned} \tag{6.15}$$

Clearly, (6.15) remains valid for $0 \le x \le 1$ by the fact that
$EW^21(W > x) \le EW^2 \le 2$. Combining (6.11)-(6.15), we have

$$I_2 \le O_\alpha(1)(1 + x^3)(1 - \Phi(x)). \tag{6.16}$$

Similarly,

$$I_4 \le O_\alpha(1)(1 + x^3)(1 - \Phi(x)) \tag{6.17}$$

and

$$I_3 \le (1 - \Phi(x))E(2 + W^2) + E(2 + W^2)1(W > \delta + x) \le O_\alpha(1)(1 + x^3)(1 - \Phi(x)). \tag{6.18}$$

Let $g(w) = (wf(w))'$. Then $I_1 = \int_0^\delta Eg(W + t)\,dt$. It is easy to see that
(for example, Chen and Shao (2001))

$$g(w) = \begin{cases} \big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w\big)\Phi(x), & w \ge x \\ \big(\sqrt{2\pi}(1 + w^2)e^{w^2/2}\Phi(w) + w\big)(1 - \Phi(x)), & w < x \end{cases} \tag{6.19}$$

and

$$0 \le \sqrt{2\pi}(1 + w^2)e^{w^2/2}(1 - \Phi(w)) - w \le \frac{2}{1 + w^3}, \tag{6.20}$$

we have for $0 \le t \le \delta$

$$\begin{aligned}
Eg(W + t) &= Eg(W + t)1\{W + t \ge x\} + Eg(W + t)1\{W + t \le 0\} \\
&\quad + Eg(W + t)1\{0 < W + t < x\} \\
&\le \frac{2}{1 + x^3}P(W + t \ge x) + 2(1 - \Phi(x))P(W + t \le 0) \\
&\quad + \sqrt{2\pi}(1 - \Phi(x))E\{(1 + (W + t)^2 + (W + t))e^{(W + t)^2/2}1\{0 < W + t < x\}\} \\
&= O_\alpha(1)(1 + x^3)(1 - \Phi(x))
\end{aligned} \tag{6.21}$$

and hence

$$I_1 = O_\alpha(1)\delta(1 + x^3)(1 - \Phi(x)). \tag{6.22}$$
Putting (6.9), (6.16), (6.17), (6.18) and (6.22) together gives

$$P(W > x + \delta) - (1 - \Phi(x)) \le O_\alpha(1)(1 - \Phi(x))\theta(1 + x^3)(\delta + \delta_1 + \delta_2)$$

and therefore

$$P(W > x) - (1 - \Phi(x)) \le O_\alpha(1)(1 - \Phi(x))\theta(1 + x^3)(\delta + \delta_1 + \delta_2). \tag{6.23}$$
As to the lower bound, similarly to (6.5) and (6.8), we have

$$f'(W + t) \ge (W - \delta)f(W - \delta) + 1 - \Phi(x) - 1(W > x - \delta)$$

and

$$\begin{aligned}
P(W > x - \delta) - (1 - \Phi(x))
&\ge \theta E((W - \delta)f(W - \delta) - Wf(W)) - \delta_1E(|W|(1 + |W|)f(W)) \\
&\quad - \delta_1E|1 - \Phi(x) - 1(W > x - \delta)|(1 + |W|) - \delta_2E(2 + W^2)f(W).
\end{aligned}$$

Now following the same proof of (6.23) leads to

$$P(W > x) - (1 - \Phi(x)) \ge -O_\alpha(1)(1 - \Phi(x))\theta(1 + x^3)(\delta + \delta_1 + \delta_2).$$

This completes the proof of Theorem 3.1. □

6.2. Proof of Proposition 4.2

For n > 2, X ~ C/{0, 1, . . . , n}, let £„ = S*(X) be the number of l's in the binary 
string of X generated in any system of binary codes satisfying (|4.7p . Without 
loss of generality, assume that 

S*(0) = 0. (6.24) 

The condition (4.7) allows $S(X)$ to be represented in terms of the labels of the
nodes in a binary tree described as follows. Let $T$ be an infinite binary tree. For
$k \ge 0$, the nodes of $T$ in the $k$th generation are denoted by (from left to right)
$(V_{k,0}, \dots, V_{k,2^k-1})$. Each node is labeled by 0 or 1. Assume $T$ satisfies

C1. The root is labeled by 0.

C2. The labels of two siblings are different.

C3. Infinite binary subtrees of $T$ with roots $\{V_{k,0} : k \ge 0\}$ are the same
as $T$.

For $2^{k-1} - 1 < n \le 2^k - 1$, represent $0, \dots, n$ by the nodes
$V_{k,0}, \dots, V_{k,n}$ respectively. Then $S(X)$ is the sum of 1's in the shortest path
from $V_{k,X}$ to the root of the tree. The condition C3 implies that $S(X)$ does not
depend on $k$ so that the representation is well defined.

We consider two extreme cases. Define a binary tree T by always assigning 
to the left sibling and 1 to the right sibling. Then the number of l's in the binary 
string of X is that in the binary expansion of X. Denote it by S n (= S(X)). Next, 
define a binary tree T by assigning Vk t o = 0, 14. i = 1 for all k and assigning 1 
to the left sibling and 0 to the right sibling for all other nodes. Let the number
of l's in the binary string of X on T be S n (= S(X)). Both T and T are infinity 
binary trees satisfying CI, C2 and C3 and both S n and S n satisfy (|4.7jl . It is 
easy to see that for all integers n > 0, 

S n <st S n < s t S n (6.25) 

where < s t denotes stochastic ordering. Therefore, it suffices to prove Cramer 
moderate deviation results for W n and W n where W n — (S n — |)/^/| an d 
W n = (S n — f)/yf • We suppress the subscript n in the following and follow 
Diaconis (1977) in constructing the exchangeable pair (W, W). Let Z be a ran- 
dom variable uniformly distributed over the set {1, 2, • ■ ■ , k} and independent 

of X, and let the random variable X 1 be defined by 

fe 

X' = J2xi2 k -\ 

i=l 

where 

(Xi if i^I 

Xl = ( 1 if j = /,l; = 0andl + 2 fc - / < n (6.26) 
[o else. 

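Let $S' = S - X_I + X'_I$ and $W' = (S' - k/2)/\sqrt{k/4}$. As proved in Diaconis (1977), $(W, W')$ is an exchangeable pair and

$$E(W - W' \mid W) = \lambda\Big(W - \Big(-\frac{E(Q \mid W)}{\sqrt{k}}\Big)\Big), \qquad (6.27)$$

$$\frac{1}{2\lambda}E((W - W')^2 \mid W) - 1 = -\frac{E(Q \mid W)}{k}, \qquad (6.28)$$

where $\lambda = 2/k$ and $Q = \sum_{i=1}^{k} I(X_i = 0,\ X + 2^{k-i} > n)$. From Lemma 6.1 and Theorem 3.1 (with $\delta = O(k^{-1/2})$, $\delta_1 = O(k^{-1})$, $\delta_2 = O(k^{-1/2})$),

$$\frac{P(W \ge x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\frac{1}{\sqrt{k}}$$

for $0 \le x \le k^{1/6}$. Repeating the above argument for $-W$, we have

$$\frac{P(-W \ge x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\frac{1}{\sqrt{k}}$$

for $0 \le x \le k^{1/6}$.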
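The identities (6.27) and (6.28) can be checked by brute force for small $n$. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, digits are indexed with $X_1$ the most significant, and the identities are verified conditionally on each value of $X$ (which implies them conditionally on $W$). Both identities are rescaled to the $S$ scale so that exact rational arithmetic can be used.

```python
from fractions import Fraction


def binary_digits(x, k):
    """Digits X_1, ..., X_k of x, most significant first (x = sum X_i 2^(k-i))."""
    return [(x >> (k - i)) & 1 for i in range(1, k + 1)]


def check_exchangeable_pair_identities(n):
    """Check (6.27) and (6.28) conditionally on each X in {0, ..., n}, n >= 1."""
    k = n.bit_length()  # number of binary digits, so 2^(k-1) <= n <= 2^k - 1
    lam = Fraction(2, k)
    for x in range(n + 1):
        d = binary_digits(x, k)
        s = sum(d)
        q = sum(1 for i in range(1, k + 1)
                if d[i - 1] == 0 and x + 2 ** (k - i) > n)
        first = Fraction(0)   # E(S - S' | X), averaging over the index I
        second = Fraction(0)  # E((S - S')^2 | X)
        for i in range(1, k + 1):
            # Rule (6.26): digit I becomes 1 only if it was 0 and X stays <= n.
            if d[i - 1] == 0 and x + 2 ** (k - i) <= n:
                new_digit = 1
            else:
                new_digit = 0
            diff = d[i - 1] - new_digit
            first += Fraction(diff, k)
            second += Fraction(diff * diff, k)
        # (6.27) multiplied by sqrt(k/4): E(S - S' | X) = lam * (S - k/2 + Q/2)
        if first != lam * Fraction(2 * s - k + q, 2):
            return False
        # (6.28) on the S scale: E((S - S')^2 | X) - 1 = -Q/k
        if second - 1 != Fraction(-q, k):
            return False
    return True
```

Calling `check_exchangeable_pair_identities(n)` for a range of small `n` exercises both the case $n = 2^k - 1$ (where $Q = 0$) and the general case.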

Next, we notice that $\bar S$ and $\tilde S$ can be written as, with $X \sim U\{0, 1, \ldots, n\}$,

$$\bar S = I(0 \le X \le 2^{k-1} - 1)\bar S + I(2^{k-1} \le X \le n)\bar S$$

and

$$\tilde S = I(0 \le X \le 2^{k-1} - 1)\tilde S + I(2^{k-1} \le X \le n)\tilde S.$$

Therefore, 

-W-' 

sfk/l 

-\ + 1(0 < X < 2 fe -! - l)(*=i - S) + I(2 k - 1 < X < - S) 



and 

-| + 1(0 < X < 2 k ~ 1 - l)(S - h=k) + I(2 k - 1 < X < n)(S - ^ 



W 



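Conditioning on $0 \le X \le 2^{k-1} - 1$, both the distributions of $\bar S(X)$ and $\tilde S(X)$ are Binomial$(k - 1, 1/2)$, which yields

$$\mathcal{L}\Big(\frac{k-1}{2} - \tilde S \,\Big|\, 0 \le X \le 2^{k-1} - 1\Big) = \mathcal{L}\Big(\bar S - \frac{k-1}{2} \,\Big|\, 0 \le X \le 2^{k-1} - 1\Big).$$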

On the other hand, when 2 k ~ 1 < X < n, S(X) = k - 1 - S(X). Therefore, 
$\tilde W$ has the same distribution as $-\bar W$ up to a shift of order $k^{-1/2}$, which implies that the Cramér moderate deviation result also holds for $\tilde W$. This completes the proof of Proposition 4.2.

Lemma 6.1. We have $E(Q \mid S) = O(1)(1 + |W|)$.

Proof. Write

$$n = \sum_{i \ge 1} 2^{k - p_i}$$

with $1 = p_1 < p_2 < \cdots < p_{k_1} \le k$ the positions of the ones in the binary expansion of $n$, where $k_1 \le k$. Recall that $X$ is uniformly distributed over $\{0, 1, \ldots, n\}$, and that

$$X = \sum_{i=1}^{k} X_i\, 2^{k-i}$$

with exactly $S$ of the indicator variables $X_1, \ldots, X_k$ equal to 1.

We say that $X$ falls in category $i$, $i = 1, \ldots, k_1$, when

$$X_{p_1} = 1,\ X_{p_2} = 1,\ \ldots,\ X_{p_{i-1}} = 1 \text{ and } X_{p_i} = 0. \qquad (6.29)$$

We say that $X$ falls in category $k_1 + 1$ if $X = n$. This special category is nonempty only when $S = k_1$ and in this case, $Q = k - k_1$, which gives the last term in (6.30).

Note that if $X$ is in category $i$ for $i \le k_1$, then, since $X$ can be no greater than $n$, the digits of $X$ and $n$ match up to the $p_i$th, except for the digit in place $p_i$, where $n$ has a one and $X$ a zero. Further, up to this digit, $n$ has $p_i - i$ zeros, and so $X$ has $\alpha_i = p_i - i + 1$ zeros. Changing any of these $\alpha_i$ zeros except the zero in position $p_i$ to ones results in a number $n$ or greater, while changing any other zeros, since digit $p_i$ of $n$ is one and of $X$ zero, does not. Hence $Q$ is at most $\alpha_i$ when $X$ falls in category $i$. Since $X$ has $S$ ones in its expansion, $i - 1$ of which are accounted for by (6.29), the remaining $S - (i - 1)$ are uniformly distributed over the $k - p_i = k - (i - 1) - \alpha_i$ remaining digits $\{X_{p_i + 1}, \ldots, X_k\}$. Thus, we have the inequality

$$E(Q \mid S) \le \frac{1}{A}\sum_{i \ge 1}\binom{k - (i-1) - \alpha_i}{S - (i-1)}\alpha_i + \frac{1}{A}\, I(S = k_1)(k - k_1), \qquad (6.30)$$

where

$$A = \sum_{i \ge 1}\binom{k - (i-1) - \alpha_i}{S - (i-1)} + I(S = k_1) \qquad (6.31)$$

and $1 = \alpha_1 \le \alpha_2 \le \alpha_3 \le \cdots$.

Note that if $k_1 = k$, the last term of (6.30) equals 0. When $k_1 < k$, the last term of (6.30) is bounded by a constant, so we omit this term in the following argument.
We consider two cases.

Case 1: $S \ge k/2$. As $\alpha_i \ge 1$ for all $i$, there are at most $k + 1$ nonzero terms in the sum (6.30). Divide the summands into two groups, those for which $\alpha_i \le 2\log_2 k$ and those with $\alpha_i > 2\log_2 k$. The first group can sum to no more than $2\log_2 k$, because the sum is like a weighted average of the $\alpha_i$.

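For the second group, note that

$$\binom{k-(i-1)-\alpha_i}{S-(i-1)}\frac{\alpha_i}{A} \le \alpha_i\binom{k-(i-1)-\alpha_i}{S-(i-1)}\Big/\binom{k-1}{S} = \alpha_i\prod_{j=1}^{\alpha_i-1}\frac{k-S-j}{k-j}\prod_{j=0}^{i-2}\frac{S-j}{k-(\alpha_i-1)-1-j} \le \frac{\alpha_i}{2^{\alpha_i-1}} \le \frac{4\log_2 k}{k^2},$$

where the first inequality uses $A \ge \binom{k-1}{S}$, the second inequality follows from $S \ge k/2$, and the last inequality from $\alpha_i > 2\log_2 k$. Therefore, since there are at most $k + 1$ such terms, the sum of the second group of terms is bounded by a constant.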

Case 2: $S < k/2$. Divide the sum on the right-hand side into two groups according to whether $i \le 2\log_2 k$ or $i > 2\log_2 k$. Clearly,

$$\binom{k-(i-1)-\alpha_i}{S-(i-1)}\Big/A \le 1/2^{i-1}$$

using the assumption $S < k/2$ and the fact that $S \ge i - 1$. The above inequality is true for all $i$, so the summation for the part where $i > 2\log_2 k$ is bounded by 1.
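The two binomial-ratio bounds used in Case 1 and Case 2 can be sanity-checked exhaustively for small $k$. The following is a minimal sketch, not part of the proof: the function names are hypothetical, $A$ is replaced by its lower bound $\binom{k-1}{S}$ (the $i = 1$ term), and the parameter ranges encode $S \ge i - 1$ and $p_i = i - 1 + \alpha_i \le k$.

```python
from fractions import Fraction
from math import comb


def ratio(k, s, i, alpha):
    """binom(k-(i-1)-alpha, s-(i-1)) / binom(k-1, s), as an exact fraction."""
    top, bottom = k - (i - 1) - alpha, s - (i - 1)
    if top < 0 or bottom < 0:
        return Fraction(0)
    return Fraction(comb(top, bottom), comb(k - 1, s))


def check_ratio_bounds(k_max):
    """Case 1 (S >= k/2): ratio <= 2^-(alpha-1). Case 2 (S < k/2): ratio <= 2^-(i-1)."""
    for k in range(2, k_max + 1):
        for s in range(0, k):                      # s <= k-1 keeps binom(k-1, s) > 0
            for i in range(1, s + 2):              # category index, so s >= i - 1
                for alpha in range(1, k - i + 2):  # p_i = i - 1 + alpha <= k
                    r = ratio(k, s, i, alpha)
                    if 2 * s >= k:
                        if r > Fraction(1, 2 ** (alpha - 1)):
                            return False
                    elif r > Fraction(1, 2 ** (i - 1)):
                        return False
    return True
```

Exact fractions are used so that boundary cases where the ratio equals the bound are not affected by rounding.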

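Next we consider $i \le 2\log_2 k$. When $S \ge k\frac{\log\alpha_i}{\alpha_i - 1} + 2\log_2 k$, we have $\alpha_i\big(\frac{k-S-1}{k-(i-1)-1}\big)^{\alpha_i-1} \le 1$. Solving $S$ from the inequality $\alpha_i\big(\frac{k-S-1}{k-(i-1)-1}\big)^{\alpha_i-1} \le 1$, we see that it is equivalent to the inequality $S \ge \big(1 - e^{-\frac{\log\alpha_i}{\alpha_i-1}}\big)k - 1 + e^{-\frac{\log\alpha_i}{\alpha_i-1}}\, i$, which is a result of the above assumption on $S$ when $i \le 2\log_2 k$. Now we have

$$\binom{k-(i-1)-\alpha_i}{S-(i-1)}\frac{\alpha_i}{A} \le \alpha_i\binom{k-(i-1)-\alpha_i}{S-(i-1)}\Big/\binom{k-1}{S} = \alpha_i\prod_{j=0}^{i-2}\frac{S-j}{k-1-j}\prod_{j=1}^{\alpha_i-1}\frac{k-S-j}{k-(i-1)-j} \le \frac{1}{2^{i-1}}\,\alpha_i\Big(\frac{k-S-1}{k-(i-1)-1}\Big)^{\alpha_i-1} \le \frac{1}{2^{i-1}} \qquad (6.33)$$

using the fact that $\alpha_i\big(\frac{k-S-1}{k-(i-1)-1}\big)^{\alpha_i-1} \le 1$.

On the other hand, if $S < k\frac{\log\alpha_i}{\alpha_i-1} + 2\log_2 k$, then $\alpha_i S/(k-1) = O(1)\log_2 k$, which implies

$$\binom{k-(i-1)-\alpha_i}{S-(i-1)}\frac{\alpha_i}{A} \le \frac{\alpha_i S}{k-1}\prod_{j=1}^{i-2}\frac{S-j}{k-1-j}\prod_{j=1}^{\alpha_i-1}\frac{k-S-j}{k-(i-1)-j} = O(1)\log_2 k/2^{i-2}.$$

This proves that the right-hand side of (6.30) is bounded by $O(1)\log_2 k$.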

To complete the proof of the lemma, i.e., to prove $E(Q \mid W) \le C(1 + |W|)$, we only need to show that $E(Q \mid S) \le C$ for some universal constant $C$ when $|W| \le \log_2 k$, that is, when $k/2 - \sqrt{k/4}\,\log_2 k \le S \le k/2 + \sqrt{k/4}\,\log_2 k$. Following the argument in Case 2 above, we only need to consider the summands where $i \le 2\log_2 k$, because the other part where $i > 2\log_2 k$ is bounded by 1 as proved in Case 2.

When $\alpha_i$ and $k$ are bigger than some universal constant, $k/2 - \sqrt{k/4}\,\log_2 k \ge \frac{\log\alpha_i}{\alpha_i-1} \times k + 2\log_2 k$, which implies $\big(\frac{k-S-1}{k-(i-1)-1}\big)^{\alpha_i-1} \times \alpha_i \le 1$ and $\binom{k-(i-1)-\alpha_i}{S-(i-1)}\big/A \times \alpha_i \le 1/2^{i-1}$. Since both parts for $i \le 2\log_2 k$ and $i > 2\log_2 k$ are bounded by some constant, $E(Q \mid S) \le C$ when $|W| \le \log_2 k$ and hence the lemma is proved.
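The statement of Lemma 6.1 can be explored numerically by computing $E(Q \mid S)$ exactly for a given $n$. This is a minimal sketch under stated assumptions: the function names are hypothetical, $S$ is taken to be the number of ones in the binary expansion of $X$, and the reported ratio $E(Q \mid S)/(1 + |W|)$ is only an empirical indicator for one value of $n$, not a proof of a uniform bound.

```python
import math
from collections import defaultdict
from fractions import Fraction


def q_and_s(x, n):
    """Return (Q, S) for X = x, with Q = #{i : X_i = 0 and x + 2^(k-i) > n}."""
    k = n.bit_length()
    s = bin(x).count("1")
    q = sum(1 for i in range(1, k + 1)
            if ((x >> (k - i)) & 1) == 0 and x + (1 << (k - i)) > n)
    return q, s


def conditional_mean_q(n):
    """Exact E(Q | S = s) for X uniform on {0, ..., n}, keyed by s."""
    totals, counts = defaultdict(int), defaultdict(int)
    for x in range(n + 1):
        q, s = q_and_s(x, n)
        totals[s] += q
        counts[s] += 1
    return {s: Fraction(totals[s], counts[s]) for s in counts}


def worst_ratio(n):
    """max over s of E(Q | S = s) / (1 + |W|), where W = (s - k/2) / sqrt(k/4)."""
    k = n.bit_length()
    return max(float(e) / (1 + abs(s - k / 2) / math.sqrt(k / 4))
               for s, e in conditional_mean_q(n).items())
```

Two easy consistency checks follow from the definitions: $Q \equiv 0$ when $n = 2^k - 1$, and $Q = k - k_1$ when $X = n$.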
6.3. Proof of Propositions 4.3 and 4.4

Let $\tilde W$ have the conditional distribution of $W$ ($W_1$, $W_2$ respectively) given $|W| \le c_1\sqrt{n}$ ($|W_1|, |W_2| \le c_1\sqrt{n}$ respectively), where $c_1$ is to be determined. If we can prove that

$$\frac{P(\tilde W \ge x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n} \qquad (6.34)$$

for $0 \le x \le n^{1/6}$, then from the fact that (Ellis (1985))

$$P(|W| > K\sqrt{n}) \le e^{-nC(K)}, \qquad (6.35)$$

and

$$P(|W_1| > K\sqrt{n} \mid S < 0) \le e^{-nC(K)}, \qquad P(|W_2| > K\sqrt{n} \mid S > 0) \le e^{-nC(K)}$$

for any positive number $K$, where $C(K)$ is a positive constant depending only on $K$, we have, with $\delta_2 = O(1/\sqrt{n})$,

$$\frac{P(W \ge x)}{1 - \Phi(x)} \le \frac{P(\tilde W \ge x) + P(\delta_2 |W| > 1/2)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)/\sqrt{n}$$

for $0 \le x \le n^{1/6}$. Similarly, (4.15) and (4.…) are also true. Therefore, we prove the Cramér moderate deviation for $\tilde W$ (still denoted by $W$ in the following) defined below. Assume the state space of the spins is $\Sigma = \{(\sigma_1, \sigma_2, \ldots, \sigma_n) \in \{-1, 1\}^n\}$ such that $\sum_{i=1}^{n}\sigma_i/n \in [a, b]$, where $[a, b]$ is any interval within which there is only one solution $m$ to the equation $m = \tanh(\beta(m + h))$. Let $S = \sum_{i=1}^{n}\sigma_i$, $W = (S - nm)/\sigma$ and $\sigma^2 = n\frac{1 - m^2}{1 - (1 - m^2)\beta}$.

Note that in Case 1 and Case 2, $1 - (1 - m^2)\beta > 0$, thus $\sigma^2$ is well defined. Moreover, $[a, b]$ is chosen such that $|W| \le c_1\sqrt{n}$. The joint distribution of the spins is

$$Z_{\beta,h}^{-1}\exp\Big(\frac{\beta}{n}\sum_{1 \le i < j \le n}\sigma_i\sigma_j + \beta h\sum_{i=1}^{n}\sigma_i\Big).$$




Let $I$ be a random variable uniformly distributed over $\{1, \ldots, n\}$, independent of $\{\sigma_i, 1 \le i \le n\}$. Let $\sigma'_i$ be a random sample from the conditional distribution of $\sigma_i$ given $\{\sigma_j, j \ne i, 1 \le j \le n\}$. Define $W' = W - (\sigma_I - \sigma'_I)/\sigma$. Then $(W, W')$ is an exchangeable pair. Let

$$A(w) = \frac{\exp(-\beta(m+h) - \beta\sigma w/n + \beta/n)}{\exp(-\beta(m+h) - \beta\sigma w/n + \beta/n) + \exp(\beta(m+h) + \beta\sigma w/n - \beta/n)}$$

and

$$B(w) = \frac{\exp(\beta(m+h) + \beta\sigma w/n + \beta/n)}{\exp(\beta(m+h) + \beta\sigma w/n + \beta/n) + \exp(-\beta(m+h) - \beta\sigma w/n - \beta/n)}.$$

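It is easy to see that

$$\begin{aligned}
\frac{e^{-\beta(m+h) - \beta\sigma w/n}}{e^{-\beta(m+h) - \beta\sigma w/n} + e^{\beta(m+h) + \beta\sigma w/n}} &\le A(w) = \frac{\exp(-\beta(m+h) - \beta\sigma w/n)}{\exp(-\beta(m+h) - \beta\sigma w/n) + \exp(\beta(m+h) + \beta\sigma w/n - 2\beta/n)} \\
&\le \frac{e^{-\beta(m+h) - \beta\sigma w/n}}{e^{-\beta(m+h) - \beta\sigma w/n} + e^{\beta(m+h) + \beta\sigma w/n}}\, e^{2\beta/n}
\end{aligned}$$

and

$$\begin{aligned}
\frac{e^{\beta(m+h) + \beta\sigma w/n}}{e^{\beta(m+h) + \beta\sigma w/n} + e^{-\beta(m+h) - \beta\sigma w/n}} &\le B(w) = \frac{\exp(\beta(m+h) + \beta\sigma w/n)}{\exp(\beta(m+h) + \beta\sigma w/n) + \exp(-\beta(m+h) - \beta\sigma w/n - 2\beta/n)} \\
&\le \frac{e^{\beta(m+h) + \beta\sigma w/n}}{e^{\beta(m+h) + \beta\sigma w/n} + e^{-\beta(m+h) - \beta\sigma w/n}}\, e^{2\beta/n}.
\end{aligned}$$

Therefore

$$A(W) + B(W) = 1 + O(1)\frac{1}{n}$$

and

$$A(W) - B(W) = -\tanh(\beta(m+h) + \beta\sigma W/n) + O(1)\frac{1}{n}.$$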
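The roles of $A(w)$ and $B(w)$ as single-spin flip probabilities, and the two-sided bounds above, can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the restriction of the state space to $[a, b]$ is ignored, and the default values of `m` and `sigma` are arbitrary illustrative numbers (the identity $S = \sigma W + nm$ used in the check holds for any such choice).

```python
import math


def prob_plus(sum_others, n, beta, h):
    """P(sigma_i = +1 | other spins) under exp(beta/n * sum_{i<j} s_i s_j + beta*h*sum s_i)."""
    field = beta * (sum_others / n + h)
    return math.exp(field) / (math.exp(field) + math.exp(-field))


def A(w, n, beta, h, m, sigma):
    """Probability that a +1 spin is resampled as -1, written as in the text."""
    u = beta * (m + h) + beta * sigma * w / n
    return math.exp(-u + beta / n) / (math.exp(-u + beta / n) + math.exp(u - beta / n))


def B(w, n, beta, h, m, sigma):
    """Probability that a -1 spin is resampled as +1, written as in the text."""
    u = beta * (m + h) + beta * sigma * w / n
    return math.exp(u + beta / n) / (math.exp(u + beta / n) + math.exp(-u - beta / n))


def check_flip_probabilities(n=40, beta=0.8, h=0.1, m=0.3, sigma=5.0):
    slack = math.exp(2 * beta / n) - 1  # size of the O(1)/n terms
    for s in range(-n + 2, n - 1, 2):
        w = (s - n * m) / sigma         # so that s = sigma * w + n * m
        u = beta * (m + h) + beta * sigma * w / n
        a = A(w, n, beta, h, m, sigma)
        b = B(w, n, beta, h, m, sigma)
        ok = (abs(a - (1 - prob_plus(s - 1, n, beta, h))) < 1e-12
              and abs(b - prob_plus(s + 1, n, beta, h)) < 1e-12
              and -1e-12 <= a + b - 1 <= slack + 1e-12
              and abs(a - b + math.tanh(u)) <= slack + 1e-12)
        if not ok:
            return False
    return True
```

The quantity `slack` equals $e^{2\beta/n} - 1$, which makes the $O(1)/n$ remainder in the two displayed identities explicit.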
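Note that

$$\begin{aligned}
E(W - W' \mid \Sigma) &= \frac{1}{\sigma}E(\sigma_I - \sigma'_I \mid \Sigma) \\
&= \frac{2}{\sigma}E\big(I(\sigma_I = 1, \sigma'_I = -1) - I(\sigma_I = -1, \sigma'_I = 1) \mid \Sigma\big) \\
&= \frac{2}{\sigma}\frac{n + S}{2n}A(W)I(S - 2 \ge an) - \frac{2}{\sigma}\frac{n - S}{2n}B(W)I(S + 2 \le bn) \\
&= (A(W) + B(W))\Big(\frac{W}{n} + \frac{m}{\sigma}\Big) + \frac{1}{\sigma}(A(W) - B(W)) \\
&\quad - \frac{\sigma W + nm + n}{\sigma n}A(W)I(S - 2 < an) + \frac{n - \sigma W - nm}{\sigma n}B(W)I(S + 2 > bn) \\
&= \Big(\frac{W}{n} + \frac{m}{\sigma}\Big)\Big(1 + O(1)\frac{1}{n}\Big) - \frac{1}{\sigma}\Big(\tanh(\beta(m+h) + \beta\sigma W/n) + O(1)\frac{1}{n}\Big) \\
&\quad - \frac{S + n}{\sigma n}A(W)I(S - 2 < an) + \frac{n - S}{\sigma n}B(W)I(S + 2 > bn) \\
&= \lambda(W - R),
\end{aligned}$$

where

$$\lambda = \frac{1 - (1 - m^2)\beta}{n} > 0$$

and

$$R = \frac{1}{\lambda}\frac{\tanh''(\beta(m+h) + \xi)}{2}\frac{\beta^2\sigma}{n^2}W^2 + \frac{1}{\lambda}\frac{S + n}{\sigma n}A(W)I(S - 2 < an) - \frac{1}{\lambda}\frac{n - S}{\sigma n}B(W)I(S + 2 > bn) + O(1)\Big(\frac{W}{n} + \frac{1}{\sigma}\Big),$$

where $\xi$ is between 0 and $\beta\sigma W/n$. Similarly,

$$\begin{aligned}
E((W - W')^2 \mid \Sigma) &= \frac{4}{\sigma^2}E\big(I(\sigma_I = 1, \sigma'_I = -1) + I(\sigma_I = -1, \sigma'_I = 1) \mid \Sigma\big) \\
&= \frac{2(1 - m^2)}{\sigma^2} + O(1)\frac{W}{n\sigma} + O\Big(\frac{1}{n\sigma^2}\Big) + O\Big(\frac{I(S - 2 < an \text{ or } S + 2 > bn)}{\sigma^2}\Big).
\end{aligned}$$

Therefore, recalling that $\sigma^2 = n\frac{1 - m^2}{1 - (1 - m^2)\beta}$,

$$|E(D \mid W) - 1| \le O\Big(\frac{1}{\sqrt{n}}\Big)(1 + |W|).$$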
For $R$, with $\delta_2 = O(1/\sqrt{n})$,

$$|E(R \mid W)| \le \delta_2(1 + W^2),$$

and if $c_1$ is chosen such that $\delta_2 |W| \le 1/2$, the second alternative of (3.4) is satisfied with $\alpha = 1/2$. Thus from Theorem 3.1 we have the following moderate deviation result for $W$:

$$\frac{P(W \ge x)}{1 - \Phi(x)} = 1 + O(1)(1 + x^3)\frac{1}{\sqrt{n}}$$

for $0 \le x \le n^{1/6}$. This completes the proof of (4.12) and (4.15).
6.4. Proof of Proposition 4.5

Since $1 - \Phi(x) \ge \frac{1}{2(1 + x)}e^{-x^2/2}$ for $x \ge 0$, (4.18) becomes trivial if $x\gamma \ge 1/8$. Thus, we can assume

$$x\gamma \le 1/8. \qquad (6.36)$$

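Let $f = f_x$ be the Stein solution to equation (6.2). Let $W^{(i)} = W - \xi_i$ and $K_i(t) = E\xi_i(I\{0 \le t \le \xi_i\} - I\{\xi_i \le t < 0\})$. It is known that (see, for example, (2.18) in Chen and Shao (2005))

$$EWf(W) = \sum_{i=1}^{n} E\int_{-\infty}^{\infty} f'(W^{(i)} + t)K_i(t)\,dt.$$

Since $\int_{-\infty}^{\infty} K_i(t)\,dt = E\xi_i^2$, we have

$$\begin{aligned}
P(W \ge x) - (1 - \Phi(x)) &= EWf(W) - Ef'(W) \\
&= \sum_{i=1}^{n} E\int_{-\infty}^{\infty}\big(f'(W^{(i)} + t) - f'(W)\big)K_i(t)\,dt \\
&= \sum_{i=1}^{n} E\int_{-\infty}^{\infty}\big((W^{(i)} + t)f(W^{(i)} + t) - Wf(W)\big)K_i(t)\,dt \\
&\quad + \sum_{i=1}^{n} E\int_{-\infty}^{\infty}\big(I\{W^{(i)} + t \ge x\} - I\{W \ge x\}\big)K_i(t)\,dt \\
&:= R_1 + R_2. \qquad (6.37)
\end{aligned}$$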
It suffices to show that

$$|R_1| \le C(1 + x^3)\gamma(1 - \Phi(x))e^{x^3\gamma} \qquad (6.38)$$

and

$$|R_2| \le C(1 + x^2)\gamma(1 - \Phi(x))e^{x^3\gamma}. \qquad (6.39)$$

To estimate $R_1$, let $g(w) = (wf(w))'$. It is easy to see that

$$R_1 = \sum_{i=1}^{n} E\int_{-\infty}^{\infty}\int_{\xi_i}^{t} g(W^{(i)} + s)\,ds\,K_i(t)\,dt. \qquad (6.40)$$

By (|6~19)) and f63D|l . following the proof of (|63Tj) . we have 

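$$\begin{aligned}
Eg(W^{(i)} + s) &= Eg(W^{(i)} + s)I\{W^{(i)} + s \ge x\} + Eg(W^{(i)} + s)I\{W^{(i)} + s \le 0\} \\
&\quad + Eg(W^{(i)} + s)I\{0 < W^{(i)} + s < x\} \\
&\le \frac{2}{1 + x^3}P(W^{(i)} + s \ge x) + 2(1 - \Phi(x))P(W^{(i)} + s \le 0) \\
&\quad + \sqrt{2\pi}(1 - \Phi(x))E\big\{(1 + (W^{(i)} + s)^2)e^{(W^{(i)} + s)^2/2}I\{0 < W^{(i)} + s < x\}\big\} \\
&\le \frac{2}{1 + x^3}P(W^{(i)} \ge x - s) + 2(1 - \Phi(x))P(W^{(i)} + s \le 0) \\
&\quad - \sqrt{2\pi}(1 - \Phi(x))\int_0^x (1 + y^2)e^{y^2/2}\,dP(W^{(i)} + s > y) \\
&\le \frac{2}{1 + x^3}P(W^{(i)} \ge x - s) + 2(1 - \Phi(x))P(W^{(i)} + s \le 0) \\
&\quad + \sqrt{2\pi}(1 - \Phi(x))P(W^{(i)} + s > 0) + \sqrt{2\pi}(1 - \Phi(x))J(s) \\
&\le \frac{2}{1 + x^3}P(W^{(i)} \ge x - s) + \sqrt{2\pi}(1 - \Phi(x)) + \sqrt{2\pi}(1 - \Phi(x))J(s), \qquad (6.41)
\end{aligned}$$

where

$$J(s) = \int_0^x (3y + y^3)e^{y^2/2}P(W^{(i)} + s > y)\,dy. \qquad (6.42)$$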



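Clearly, for $0 \le t \le x$,

$$\begin{aligned}
Ee^{t\xi_j} &= 1 + t^2E\xi_j^2/2 + \sum_{k=3}^{\infty}\frac{t^k E\xi_j^k}{k!} \\
&\le 1 + t^2E\xi_j^2/2 + \frac{t^3}{6}E|\xi_j|^3 e^{t|\xi_j|} \\
&\le \exp\Big(t^2E\xi_j^2/2 + \frac{x^3}{6}E|\xi_j|^3 e^{x|\xi_j|}\Big)
\end{aligned}$$

and hence

$$Ee^{t(W^{(i)} + s)} \le \exp\Big(t^2/2 + x|s| + \frac{x^3}{6}\gamma\Big) \quad \text{for } 0 \le t \le x. \qquad (6.43)$$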
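The single-summand moment generating function bound above can be illustrated on a small discrete example. This is a minimal sketch with hypothetical names: it takes a finite mean-zero distribution (the mean-zero property is an assumption the caller must satisfy) and returns the exact value of $Ee^{t\xi}$ together with the polynomial-type and exponential upper bounds, evaluated at the same $t$.

```python
import math


def mgf_bounds(values, probs, t):
    """Return (E exp(t*xi), polynomial-type bound, exponential bound) for mean-zero xi."""
    mgf = sum(p * math.exp(t * v) for v, p in zip(values, probs))
    second = sum(p * v * v for v, p in zip(values, probs))
    third = sum(p * abs(v) ** 3 * math.exp(t * abs(v)) for v, p in zip(values, probs))
    # 1 + t^2 E xi^2 / 2 + (t^3 / 6) E |xi|^3 exp(t |xi|)
    poly = 1 + t * t * second / 2 + t ** 3 * third / 6
    # exp of the same exponent, using 1 + u <= exp(u)
    return mgf, poly, math.exp(poly - 1)
```

For example, `mgf_bounds([0.6, -0.3], [1/3, 2/3], 1.0)` evaluates the three quantities for an asymmetric two-point distribution with mean zero; the first should not exceed the second, and the second should not exceed the third.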

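By (6.43), following the proof of Lemma 5.2 yields

$$J(s) \le C(1 + x^3)e^{x|s| + x^3\gamma}. \qquad (6.44)$$

Noting that (6.43) also implies that

$$\begin{aligned}
P(W^{(i)} \ge x - s) &\le e^{-x^2}Ee^{x(W^{(i)} + s)} \le \exp(-x^2/2 + x|s| + x^3\gamma) \\
&\le C(1 + x)(1 - \Phi(x))\exp(x|s| + x^3\gamma),
\end{aligned}$$

we have

$$Eg(W^{(i)} + s) \le C(1 + x^3)(1 - \Phi(x))e^{x|s| + x^3\gamma}$$

and therefore by (6.40)

$$\begin{aligned}
|R_1| &\le \sum_{i=1}^{n} E\int_{-\infty}^{\infty}\Big|\int_{\xi_i}^{t} g(W^{(i)} + s)\,ds\Big|K_i(t)\,dt \\
&\le C(1 + x^3)(1 - \Phi(x))e^{x^3\gamma}\sum_{i=1}^{n} E\int_{-\infty}^{\infty}\big(|t|e^{x|t|} + |\xi_i|e^{x|\xi_i|}\big)K_i(t)\,dt \\
&\le C(1 + x^3)\gamma(1 - \Phi(x))e^{x^3\gamma}. \qquad (6.45)
\end{aligned}$$

This proves (6.38).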

As to $R_2$, we apply an exponential concentration inequality of Shao (2010) (see Theorem 2.7 in [30]): for $a \ge 0$ and $b \ge 0$,

P(x - a < W (l) <x + b) 

< Ce x ^ +xa - x2 ^ + b + a)E\W (i) \e xwW + {Ee 2xwW ) 1/2 exp(- 7 - 2 /32)) 



Ce 



x-y-\-xa—x' 



(( 7 + 6 + a)(W<V w< ° + l)(Ee 2xWit) )^ 2 cxp(- 7 - 2 /32)) 



< Ce^ +m - x2 (( 7 + b + o)((l + x)e x2/2+3;37 + e x2+ ^ exp(- 7 - 2 /32)) 

< Ce* 3 ^"* 2/2 ((7 + 6 + a)(l + a:) + cxp(a; 2 /2 - 7~ 2 /32)) 

< C(l - $(x))e x ^ +xa ((j + 6 + o)(l + x 2 ) + cxp(x 2 - 7 ~ 2 /32)) , 
Here we use the fact that EW^> e xwM < xe x ' ' l 2 + x3 ~t ) by following the proof of 
1(535)1 . Therefore 

n pOC 

R2 < P(x - & < W® < x - t \ £i)Ki(t)dt 

i=i 

< C(l - $(x))e x3 -<Y. / {(I + x 2 )E(y+\t\ + \t[ i \)e x ^+exp(x 2 -j-y 32)} Ki(t)dt 

i=i 

< C(l - <i>(x))e x ^ ((1 + x 2 ) 7 + exp(a; 2 - 7~ 2 /32)) 

< C(l - $(x))e a;37 

by (6.36). Similarly, the above bound holds for $-R_2$. This proves (6.39).